The models we’ve considered so far have generally assumed structural stability - that the intercepts and slopes are shared by all groups in the data.
We’ll consider two possibilities:
groups in the data have different intercepts - test this with group indicator variables (dummies).
groups in the data have different slopes with respect to the same variables - test this using multiplicative interactions
These are not mutually exclusive - many models employ both.
Indicator Variables
Suppose the following regression:
\[Y=\beta_0+\beta_1(D)+\varepsilon \]
where \(D\) is an indicator variable, and \(y\) is a continuous variable. Where \(D\) is constructed just to distinguish between two groups, say, \(D=1\) for the US South and \(D=0\) otherwise, we usually call these dummy variables.
Indicator Variables
Now, consider the conditional expected values of \(Y\):
\[E[Y|D_1=0]= \beta_0 = \bar{Y} |D=0 \nonumber \]
\[E[Y|D_1=1]= \beta_0 + \beta_1 = \bar{Y}|D=1 \]
So these are just the conditional means of \(Y\); the conditions are given by the values of \(D\) and represent the subsamples for \(D=0\) and \(D=1\).
Differential Intercept
The dummy variable estimate represents the differential intercept, or the difference in the mean of \(Y|D=0\) and the mean of \(Y|D=1\). The estimate \(\beta_1\) is the difference in height on the \(y\) axis between the two groups in \(D\). The actual intercepts are:
\[Y|D=0 = \beta_0\]
\[Y|D=1 = \beta_0+\beta_1\]
code
## differential intercepts ----set.seed(12345)data <-tibble(X <-rnorm_multi(1000, 3, mu=c(0, 5, 0), sd=c(1, 5, .5),r =c(0,0, 0.0),varnames=c("x1", "x2", "e")) ) %>%mutate(x2=ifelse(x2>median(x2),1,0)) %>%mutate(y =1+1*x1 +2*x2 + e) m1 <- (lm(y ~ x1 +as.factor(x2), data=data))### intercept differences only ----data1 <- data %>%mutate(x1=0) predint <-data.frame(data, predict(m1, interval="confidence", se.fit=TRUE, newdata = data1))predint <- predint %>%mutate(ub=fit.fit +1.96*se.fit, lb=fit.fit -1.96*se.fit) #absurd CI for illustration ggplot(data=predint, aes(x=x2, y=fit.fit)) +geom_point(size =1) +geom_pointrange(aes(ymin=lb, ymax=ub)) +scale_x_continuous(breaks =c(0,1)) +labs ( colour =NULL, x ="x2", y ="Predicted xb" ) +theme_minimal()+ggtitle("Differential Intercepts") +annotate("text", x = .1, y =1.5, label ="B0" , size=3.5, colour="gray30") +annotate("text", x = .9, y =3.3, label ="B0 + B1", size=3.5, colour="gray30")
Differential Intercept
Note this logic extends to multiple indicator variables where \(\beta_0\) measures the intercept for the excluded category (all indicators set to zero), and \(\beta_0+\beta_1\) measures the intercept for any other category where its indicator is set to one, all others to zero.
Multiple Indicator Variables
Thinking of the Ill-Treatment models we’ve been working with:
where \(D_{1}=1\) indicates a civil war, and \(D_{2}=1\) indicates the government restricts IGO access:
\(E[Y]\) for a non civil war, no restriction country is \(\beta_{0}\).
\(E[Y]\) for a civil war state with no restrictions is \(\beta_{0}+\beta_{1}\)
\(E[Y]\) for a non civil war state with restricted access is \(\beta_{0}+\beta_{2}\)
\(E[Y]\) for a civil war state that restricts access is \(\beta_{0}+\beta_{1}+\beta_{2}\)
code
itt <-read.csv("/Users/dave/Documents/teaching/501/2023/exercises/ex4/ITT/data/ITT.csv")itt <- itt%>%group_by(ccode) %>%mutate( lagprotest=lag(protest), lagRA=lag(RstrctAccess), n=1) %>%mutate(interact= civilwar*lagRA)m3 <-lm(scarring ~ lagRA + civilwar + lagprotest + polity2 +wdi_gdpc+wdi_pop, data=itt)# summary(m1)# average predictions, end point boundaries# estimation sampleitt$used <-TRUEitt$used[na.action(m3)] <-FALSEittesample <- itt %>%filter(used=="TRUE")#across 4 combos of RA and CWpredictions <-data.frame(case=seq(1,4,1), xb=0, se=0, ub=0, lb=0, model=0 )avg <- ittesample %>%mutate(civilwar=0, lagRA=0 )all0 <-data.frame(predict(m3, interval="confidence", se.fit=TRUE, newdata = avg))predictions[1:1,2:3]<-data.frame(xb=median(all0$fit.fit, na.rm =TRUE), seall0=median(all0$se.fit,na.rm =TRUE))avg <- ittesample %>%mutate(civilwar=1, lagRA=0 )cw <-data.frame(predict(m3, interval="confidence", se.fit=TRUE, newdata = avg))predictions[2:2,2:3]<-data.frame(xb=median(cw$fit.fit, na.rm =TRUE), seall0=median(cw$se.fit,na.rm =TRUE))avg <- ittesample %>%mutate(civilwar=0, lagRA=1 )ra <-data.frame(predict(m3, interval="confidence", se.fit=TRUE, newdata = avg))predictions[3:3,2:3]<-data.frame(xb=median(ra$fit.fit, na.rm =TRUE), seall0=median(ra$se.fit,na.rm =TRUE))avg <- ittesample %>%mutate(civilwar=1, lagRA=1 )all1 <-data.frame(predict(m3, interval="confidence", se.fit=TRUE, newdata = avg))predictions[4:4,2:3]<-data.frame(xb=median(all1$fit.fit, na.rm =TRUE), seall0=median(all1$se.fit,na.rm =TRUE))predictions <- predictions %>%mutate(ub=xb +1.96*se , lb =xb -1.96*se)#plotggplot(data=predictions, aes(x=case, y=xb)) +geom_pointrange(data=predictions, aes(ymin=lb, ymax=ub)) +labs ( colour =NULL, x ="", y ="Expected Scarring Torture Reports" ) +guides(x="none")+annotate("text", x =1.6, y =5, label ="No restriction, no civil war", size=3.5, colour="gray30")+annotate("text", x =2, y =9.5, label ="No restriction, civil war", size=3.5, colour="gray30")+annotate("text", x =3, y =13, label ="Restriction, no civil war", size=3.5, colour="gray30")+annotate("text", x =3.2, y =22, label ="Restriction and civil war", size=3.5, colour="gray30")+theme_minimal()
Fixed Effects
Let’s take this notion of differential intercepts one final step. Suppose we estimate a model of Ill-Treatment and include a dummy variable for each country (minus one for the excluded category).
code
# fixed effects ----itt$c <-as.factor(itt$ctryname)itt <- itt %>%filter(!is.na(c))# set ref category to US# itt <- within(itt, cname <- relevel(cname, ref = "United States of America"))# be sure NA and "" are not categories in factor var cnamem4 <-lm(scarring ~ c, data=itt%>%filter(c!=""))#modelsummary(m4)library(stargazer)stargazer(m4, type="html", single.row=TRUE, header=FALSE, digits=3, omit.stat=c("LL","ser"), star.cutoffs=c(0.05,0.01,0.001), column.labels=c("Fixed Effects"), dep.var.caption="Dependent Variable: Scarring Torture", dep.var.labels.include=FALSE, notes=c("Standard errors in parentheses", "Significance levels: *** p<0.001, ** p<0.01, * p<0.05"), notes.append =FALSE, align=TRUE, font.size="small")
Dependent Variable: Scarring Torture
Fixed Effects
cAlbania
4.717 (3.487)
cAlgeria
-3.283 (3.487)
cAngola
6.444 (3.487)
cArgentina
3.990 (3.487)
cArmenia
4.644 (3.565)
cAustralia
-4.156 (3.565)
cAustria
-0.556 (3.565)
cAzerbaijan
3.263 (3.487)
cBangladesh
1.717 (3.487)
cBelarus
0.990 (3.487)
cBelgium
1.172 (3.487)
cBenin
-4.556 (6.065)
cBolivia
0.778 (3.658)
cBosnia and Herzegovina
-4.306 (4.662)
cBrazil
18.626*** (3.487)
cBulgaria
9.626** (3.487)
cBurkina Faso
-3.413 (3.910)
cBurundi
1.444 (3.487)
cCambodia
-3.656 (3.565)
cCameroon
-0.456 (3.565)
cCanada
-4.181 (3.770)
cCentral African Republic
5.778 (5.173)
cChad
-0.856 (3.565)
cChile
-0.856 (3.565)
cChina
28.354*** (3.487)
cColombia
5.172 (3.487)
cCongo (Brazzaville, Republic of Congo)
-2.356 (3.565)
cCosta Rica
-3.889 (5.173)
cCote d’Ivoire
-2.374 (3.487)
cCroatia
0.144 (3.565)
cCuba
2.354 (3.487)
cCzech Republic
-0.756 (3.565)
cDemocratic Republic of the Congo (Zaire, Congo-Kinshasha)
Here’s a plot of the fixed effect coefficients - each is a differential intercept, measuring the difference from the reference category.
code
# reference category is AFG by default # coefficients as data frame for plottingcoefs<-data.frame(coef(summary(m4)))coefs$country <-rownames(coefs)coefs$country <-substr(coefs$country, 2, nchar(coefs$country))coefs$country[coefs$country=="Intercept)"] <-"Intercept"coefs$sig <-ifelse(coefs$`Pr...t..`<.05, 1,0)## plot fixed effects ----p <-ggplot(coefs, aes(x = Estimate, y = country, color =factor(sig), label = country)) +geom_point(size =1) +geom_errorbarh(aes(xmin = Estimate -1.96* Std..Error, xmax = Estimate +1.96* Std..Error)) +geom_text_repel(data=coefs%>%filter(sig==1), size=2.5, force=5) +theme(axis.text.y =element_blank(),axis.ticks.y =element_blank())+labs ( colour =NULL, y ="", x ="Expected Scarring Torture Reports" ) +ggtitle("Fixed Effects", subtitle ="Country Coefficients")p +scale_color_manual(values =c("red", "black")) +guides(color="none")
The result is that each coefficient is an intercept for a particular country. Each measures the difference in that country’s intercept (or mean of \(y\)) from the excluded category, in this case, Afghanistan. Black bars are different from the excluded category at the .05 level.
Each country coefficient plus the intercept is that country’s mean of \(y\), scarring torture . Looking, for instance at the US, the coefficient is about 19, so the US mean scarring torture reports is 19 plus the intercept (5.5), so about 24.5 (the actual mean for US scarring torture reports is 24.45).
Intercept Shifts with Continuous Variables
Suppose we have a regression with a dummy variable and a continuous variable:
\[y = \beta_0 + \beta_1 d_1 + \beta_2 x_2 \] The expected value of \(y\) for each group is:
Assuming a common slope is known as the structural stability assumption.
Dummy Variables
Dummy variables are useful for measuring differences between groups.
Dummy coefficients specifically measure the differences in levels (intercepts) between groups.
Dummies may capture differences between known, discrete groups, e.g. genders, parties, races, etc.
Dummies might capture unknown differences between units, say between states or countries - this is the foundation of fixed effects.
Questioning Structural Stability
We may well have reason to doubt structural stability. In other words, we might think the slope on \(x\) for one group is increasing fast, while it increases more slowly for the other group.
For instance, we might suppose that the effect of protests on scarring torture is different in states that restrict IGO access than in states that do not. States that do not restrict access may engage in more torture as protests increase, but more slowly than states that do restrict access so do not have to hide their activities. So restricting access may moderate the effect of protests on torture, whereas not restricting access to IGOs may accelerate that effect.
Structural Stability
If this is the case, then the marginal effect of \(x_2\) is not \(\beta_2\); instead, it is .5 for one group, and 1.7 for the other.
Recall, the marginal effect of \(x_2\) in the model where we assume structural stability:
\[y = \beta_0 + \beta_1 d_1 + \beta_2 x_2 \]
is \(\beta_2\).
Illustration
code
data <- data %>%mutate(interaction=x1*x2) %>%mutate(yi =1+1*x1 +2*x2 +1.5*interaction + e)m1 <- (lm(y ~ x1 + x2, data=data))m2 <- (lm(yi ~ x1 + x2 + x1:x2, data=data))#summary(m2) #using sjPlotp1 <-plot_model(m1, type="eff", terms=c("x1", "x2"), show.values =TRUE, show.p =TRUE, title="Differential Intercept") +labs ( colour =NULL, x ="x1", y ="Predicted xb" ) +guides(colour="none") +annotate("text", x =0, y =-1, label ="x2=0", size=2.5, colour="gray30")+annotate("text", x =0, y =5, label ="x2=1", size=2.5, colour="gray30")+theme_minimal()p2 <-plot_model(m2, type="eff", terms=c("x1", "x2"), show.values =TRUE, show.p =TRUE, title="Interaction, x1*x2") +labs ( colour =NULL, x ="x1", y ="Predicted xb" ) +guides(colour="none") +annotate("text", x =0, y =-2, label ="x2=0", size=2.5, colour="gray30")+annotate("text", x =0, y =5, label ="x2=1", size=2.5, colour="gray30")+theme_minimal()p1/p2
In the first panel, we see the differential intercepts - the slope on \(x_1\) is the same for both groups, but the intercepts are different. This is structural stability. The bottom panel relaxes the structural stability assumption, allowing the slope on \(x_1\) to differ between the two groups.
Structural Stability
We relax the structural stability assumption by modeling a multiplicative interaction:
literally multiplying \(d_1\) and \(x_2\) together, and including that new variable in the model. \(\beta_3\) measures the difference in slope between the two groups in \(d_1\).
Multiplicative Interactions
Let’s work with the ITT data for the sake of continuity. The basic model we’re working with is predicting reports of scarring torture. In general, my argument is that scarring torture is potentially a tool regimes use to signal to dissidents they should watch themselves. Insofar as violent repression is costly, regimes will use scarring torture reduce dissent, thereby making violent repression less necessary. So when protest frequency is higher, we should see more scarring torture as regimes attempt to quell dissent.1
Regime transparency will modify this relationship. Regimes that restrict access to IGOs will have lower costs for using violence (scarring torture), so should employ it at a higher rate than will regimes that do not restrict IGO access. This implies the following hypothesis:
\(H_1\): Protests will be positively related to scarring torture; their effect will be stronger in states that restrict IGO access than in states that do not.
Alternatively, we can state it this way:
\(H_1\): Increases in protests will produce faster increases in scarring torture in states that restrict IGO access than in states that do not.
You should note this implies two slopes - one for states that restrict IGO access, and one for states that do not. The slope effect for protests should be larger in states that restrict access.
Since we’re positing different slopes, we’re explicitly relaxing the structural stability assumption.
Structural Stability
In this regression:
\[y = \beta_0 + \beta_1 d_1 + \beta_2 x_2 \]
The slope on \(x_2\) is the same for both groups represented by \(d_1\). The model assumes structural stability by restricting both groups in \(d_1\) to have the same slope on \(x_2\).
Note that the marginal effect of \(x_2\) is \(\beta_2\) for both groups represented by \(d_1\).
We relax structural stability by adding a multiplicative interaction between \(d_1\) and \(x_2\). This allows the slope on \(x_2\) to differ between the two groups represented by \(d_1\). Note that whether or not the two groups have different slopes is now the matter of a hypothesis test.
The slope for \(x_2\) is \(\beta_2\) when \(d_1=0\) and \(\beta_2+\beta_3\) when \(d_1=1\). The interaction term, \(\beta_3\), measures the difference in slopes between the two groups represented by \(d_1\).
Note that interpretation of \(x_2\) is now conditional on \(d_1\) - we cannot interpret the effect of \(x_2\) without reference to \(d_1\).
Estimating the model
Brambor, Clark, and Golder (2006) provide a great discussion of multiplicative interactions in regression models. Berry, Golder, and Milton (2012) provide further developments and particularly focus on hypotheses to test from interaction models. Recently, Clark and Golder (2023) present a comprehensive treatment of interactions in regression models (along with complete replication code). Resources for the book and articles are available on Matt Golder’s website.
Basics of interactions
A regression with the multiplicative interaction of two variables, \(x_1\) and \(x_2\):
includes \(x_1\) and \(x_2\); these are called constituent terms of the interaction. Always include the constituents.
includes a new variable \(x_1 * x_2\); this is the interaction term.
includes whatever other variables per usual.
interpretation is always conditional, so we cannot interpret the effects of the constituents or interaction term without reference to the other.
Here, the expected value of \(y\) depends on the intercept given by \(\beta_0\) and \(\beta_1\), and on slopes determined by \(x_2\), but those slopes differ depending on the value of \(d_1\).
Marginal Effects
An important insight: the interaction coefficient, \(\beta_3\), measures the difference in slopes between the groups represented by \(d=0\) and \(d=1\).
This means the marginal effect of \(x_2\) is not just \(\beta_2\), but \(\beta_2+\beta_3\).
Example
Let’s look at the scarring torture model interacting protests and restricted access. The model looks like this:
\[scarring = \beta_0 + \beta_1 protests + \beta_2 restricted + \beta_3 (protests*restricted) + X\beta\] where \(X\) is a matrix of control variables.
Let’s look at the estimates and think about what we can and cannot say. Recall the estimates on protests and restricted access are now conditional on the other, so we cannot make unconditional statements. We can say the effect of protests on scarring torture is 0.13 when access is restricted, and restricted access on scarring torture is about 8.5 cases when there are zero protests.
There is no effect of protests on scarring torture when restricted access is zero. When access is restricted, the effect of protests on scarring torture is \(\beta_1+\beta_3\) (.14+.78 = .92), so the sum of the uncondtional slope for protests and the interaction slope which represents the difference in slope effect for protests between restricted and unrestricted access.
Inference
To say whether this is different from zero, we need to compute a standard error for the sum of these two coefficients. We can’t just add the two standard errors together, but constructing one from the variance-covariance matrix of \(\beta\) is easy:
\[ se(\beta_{1}+ \beta_{3})= \sqrt{var(\beta_{1})+ X_{2}^{2}var(\beta_{3})+2X_{2}cov(\beta_{1},\beta_{3})} \] This how we’d compute the standard error for the sum of the two coefficients. Golder and his colleagues provide guidance for computing standard errors for a variety of models including two-way interactions (as above), three way interactions, and for models with quadratic terms - see the two figures below, both from Golder’s website
Quantities of Interest
linear predictions - computed as usual (using any of the methods we’ve learned, e.g, at-mean effects, average effects, simulated effects), this is a principle quantity of interest.
We might also consider the marginal effects of changes in one variable on the expected value of \(y\). The marginal effect is simply the change in \(y\) given a change in \(x\).
In the non-interactive model, the marginal effect of \(x_1\) is:
Note this depends on the values of \(x_2\), allowing for the possibility the effect of \(x_2\) accelerates or slows.
Interactions in regression models are symmetric in the sense that we can be interested in the effect of \(d_1\) on \(y\) given \(x_2\) or the effect of \(x_2\) on \(y\) given \(d_1\). Which you’re interested in is up to you and is a matter of theory.
The (symmetric) marginal effect of \(x_1\) on \(y\) given \(d_1\) is:
It should make sense that if \(d_1=0\), the effect is just \(\beta_2\). If \(d_1=1\), we adjust the slope by \(\beta_3\), thereby allowing the slopes to be different for the two groups in \(d_1\), relaxing structural stability. This is also a reminder that we can only interpret these coefficients conditional on one another.
Linear Predictions (Average Effects)
Here are predictions computed as average effects for the scarring torture model. Note we have predictions for two groups (restricted and unrestricted access), and their slopes are different.
code
# estimation sampleitt$used <-TRUEitt$used[na.action(m4)] <-FALSEittesample <- itt %>%filter(used=="TRUE")# loop over number of protestspred_data <-ittesampleprotests <-0medxbr0 <-0ubxbr0 <-0lbxbr0 <-0medxbr1 <-0ubxbr1 <-0lbxbr1 <-0pred_data$RstrctAccess <-0for(p inseq(1,40,1)) { pred_data$lagprotest <- p pred_data$interaction <-0 protests[p] <- p allpreds <-data.frame(predict(m4, interval="confidence", se.fit=TRUE, newdata = pred_data)) medxbr0[p] <-median(allpreds$fit.fit, na.rm=TRUE) ubxbr0[p] <-median(allpreds$fit.fit, na.rm=TRUE)+1.96*(median(allpreds$se.fit, na.rm=TRUE)) lbxbr0[p] <-median(allpreds$fit.fit, na.rm=TRUE)-1.96*(median(allpreds$se.fit, na.rm=TRUE))}pred_data$RstrctAccess <-1for(p inseq(1,40,1)) { pred_data$lagprotest <- p pred_data$interaction <- p allpreds <-data.frame(predict(m4, interval="confidence", se.fit=TRUE, newdata = pred_data)) medxbr1[p] <-median(allpreds$fit.fit, na.rm=TRUE) ubxbr1[p] <-median(allpreds$fit.fit, na.rm=TRUE)+1.96*(median(allpreds$se.fit, na.rm=TRUE)) lbxbr1[p] <-median(allpreds$fit.fit, na.rm=TRUE)-1.96*(median(allpreds$se.fit, na.rm=TRUE))}df <-data.frame(medxbr0, ubxbr0,lbxbr0,medxbr1, ubxbr1, lbxbr1, protests)#plottingggplot() +geom_ribbon(data=df, aes(x=protests, ymin=lbxbr0, ymax=ubxbr0),fill ="grey70", alpha = .4, ) +geom_ribbon(data=df, aes(x=protests, ymin=lbxbr1, ymax=ubxbr1), fill="grey60", alpha = .4, ) +geom_line(data= df, aes(x=protests, y=medxbr0))+geom_line(data= df, aes(x=protests, y=medxbr1))+labs ( colour =NULL, x ="Protests Against Government", y ="Expected Scarring Torture Reports" ) +annotate("text", x =8, y =30, label ="Restricted Access", size=3.5, colour="gray30")+annotate("text", x =8, y =11, label ="Unrestricted Access", size=3.5, colour="gray30")
As protests increase, we can see little change in scarring torture where IGO access is unrestricted. Transparency makes repressive acts harder to execute. On the other hand, where IGO access is restricted, scarring torture increases at considerably faster rate as protests increase. The confidence bands do not overlap, suggesting the slopes are different for the two groups.
Marginal Effects
We can also compute the marginal effects (as above). Let’s look at the marginal effect of restricting access to IGOs on scarring torture as protests increase.2
In terms of the regression above, we’re computing:
df <- df %>%mutate(protest=seq(1,40,1),MEprotest=m4$coefficients[2]+m4$coefficients[4]*protest, se =sqrt(vcov(m4)[2,2]+vcov(m4)[4,4]*protest^2+2*vcov(m4)[2,4]*protest), ub=MEprotest+1.96*se, lb=MEprotest-1.96*se)ggplot() +geom_line(data=df, aes(x=protest, y=MEprotest) ) +geom_ribbon(data=df, aes(x=protest, ymin=lb, ymax=ub),fill ="grey70", alpha = .4, ) +labs( x ="Protests", y ="Marginal Effect of Access Restriction on Scarring Torture")
In this plot, the marginal effect of restricting IGO access over the range of protests is positive, and increasing - it is above zero, and it has a positive slope. These are not redundant statements. The marginal effect could be negative, but increasing, or positive but decreasing. It’s important to idenitfy the location (above or below zero, or not different from zero), and its slope (positive, negative, zero).
One last observation (shown in in the simulated data earlier) is that the marginal effect is the difference between the two lines in the linear prediction plot above. The marginal effect is the difference between the two groups (restricted, unrestricted) over the values of protests.
References
Berry, William D, Matt Golder, and Daniel Milton. 2012. “Improving Tests of Theories Positing Interaction.”Journal of Politics 74 (3): 653–71.
Brambor, T., W. R. Clark, and M. Golder. 2006. “Understanding Interaction Models: Improving Empirical Analyses.”Political Analysis 14 (1): 63–82.
Clark, William Roberts, and Matt Golder. 2023. Interaction Models: Specification and Interpretation. Cambridge University Press.
It should be evident my argument implies both that protests influence scarring torture and that torture shapes protests; this simultaneity, if it exists, means the estimates are both biased an inefficient, and we should find a way to model this interesting source endogeneity. We’ll do so in week 15.↩︎
Since restricted access is binary, this is really a first difference rather than a marginal effect.↩︎